-----------------------------------------------------------------------------------------------------
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_382_stuff\logs\loglin w 5by5 intermar.log
log type: text
opened on: 29 Jan 2019, 10:18:10
. table meth_num feth_num, contents (sum count) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 42521 291 412 393 2064 45681
Mexican American | 94 18088 612 433 6067 25294
Hispanic Other | 310 633 5901 258 4507 11609
Non Hispanic Other | 101 317 214 3509 3959 8100
White non Hispanic | 615 5338 4403 5505 543276 559137
|
Total | 43641 24667 11542 10098 559873 649821
--------------------------------------------------------------------------------------------
* The data are national US marriage cross-classification counts pooled over 3 censuses: 1970, 1980 and 1990. The sample size is huge (N = 649,821), so the power to distinguish between competing hypotheses is very high.
1) The very unappealing constant-only model
. poisson count
Iteration 0: log likelihood = -1576481.7
Iteration 1: log likelihood = -1576481.7
Poisson regression Number of obs = 25
LR chi2(0) = -0.00
Prob > chi2 = .
Log likelihood = -1576481.7 Pseudo R2 = -0.0000
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 10.16558 .0012405 8194.62 0.000 10.16315 10.16801
------------------------------------------------------------------------------
. display ln(649821/25)
10.165576
. predict constant_only_class
(option n assumed; predicted number of events)
. table meth_num feth_num, contents (sum count sum constant_only_class ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 42521 291 412 393 2064 45681
| 25992.84 25992.84 25992.84 25992.84 25992.84 129964.2
|
Mexican American | 94 18088 612 433 6067 25294
| 25992.84 25992.84 25992.84 25992.84 25992.84 129964.2
|
Hispanic Other | 310 633 5901 258 4507 11609
| 25992.84 25992.84 25992.84 25992.84 25992.84 129964.2
|
Non Hispanic Other | 101 317 214 3509 3959 8100
| 25992.84 25992.84 25992.84 25992.84 25992.84 129964.2
|
White non Hispanic | 615 5338 4403 5505 543276 559137
| 25992.84 25992.84 25992.84 25992.84 25992.84 129964.2
|
Total | 43641 24667 11542 10098 559873 649821
| 129964.2 129964.2 129964.2 129964.2 129964.2 649821
--------------------------------------------------------------------------------------------
* The constant-only model fits the data in only one place: the grand total of 649,821. Every cell gets the same predicted count, N/25 = 25992.84.
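* The constant-only fit can be checked by hand: with a single parameter, every cell is predicted to hold the grand mean N/25, and _cons is the log of that mean. A minimal Python sketch (not part of the original Stata session; the variable names are illustrative):

```python
import math

# Observed 5x5 table from the log (rows: husband's group, columns: wife's group).
counts = [
    [42521, 291, 412, 393, 2064],
    [94, 18088, 612, 433, 6067],
    [310, 633, 5901, 258, 4507],
    [101, 317, 214, 3509, 3959],
    [615, 5338, 4403, 5505, 543276],
]

total = sum(sum(row) for row in counts)      # grand total N
n_cells = sum(len(row) for row in counts)    # 25 cells

fitted = total / n_cells                     # same predicted count in every cell
constant = math.log(fitted)                  # the Poisson intercept _cons

print(total, round(fitted, 2), round(constant, 6))
```

* The printed values should line up with the _cons estimate and the predicted counts shown in the tables above.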
. poisgof
Deviance goodness-of-fit = 3152733
Prob > chi2(24) = 0.0000
Pearson goodness-of-fit = 1.08e+07
Prob > chi2(24) = 0.0000
* The constant-only model fits terribly, as shown by the predicted counts next to the actual data above and by the goodness-of-fit chi-square statistics. The null hypothesis here is that the constant-only model fits as well as the saturated model (i.e. the actual data). This null hypothesis is resoundingly rejected. Note also that the deviance (likelihood ratio) chi-square and the Pearson chi-square yield the same substantive answer (rejection of the null hypothesis), but the Pearson statistic is more than 3 times as large as the deviance (1.08e+07 vs. 3152733). I would guess that the poor fit of the model makes one or both of the statistics perform poorly.
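* The two goodness-of-fit statistics reported by poisgof can be reproduced from the observed and predicted counts. The sketch below is in Python rather than Stata and assumes the standard Poisson formulas, deviance = 2*sum(O*ln(O/E) - (O - E)) and Pearson = sum((O - E)^2/E); it is an illustration, not the code Stata runs internally.

```python
import math

# Observed 5x5 table from the log, flattened to 25 cells.
counts = [
    [42521, 291, 412, 393, 2064],
    [94, 18088, 612, 433, 6067],
    [310, 633, 5901, 258, 4507],
    [101, 317, 214, 3509, 3959],
    [615, 5338, 4403, 5505, 543276],
]
cells = [c for row in counts for c in row]

# Constant-only model: every cell is predicted to hold the grand mean.
expected = sum(cells) / len(cells)

# Deviance (likelihood ratio) chi-square against the saturated model.
deviance = 2 * sum(o * math.log(o / expected) - (o - expected) for o in cells)

# Pearson chi-square.
pearson = sum((o - expected) ** 2 / expected for o in cells)

# Degrees of freedom: 25 cells minus 1 estimated parameter.
df = len(cells) - 1

print(round(deviance), f"{pearson:.3g}", df)
```

* The Pearson statistic squares the residuals, so the one enormous residual in the white/white cell inflates it far more than it inflates the deviance.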
. gen ID_const_class=(50/649821)*(abs(count- constant_only_class ))
* The formula for each cell's contribution to the Index of Dissimilarity (ID): (50/N)*abs(actual-predicted). Summing the contributions over all cells gives the ID.
. table meth_num feth_num, contents (sum ID_const_class ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 1.271747 1.977609 1.968299 1.969761 1.841187 9.028603
Mexican American | 1.992767 .6082321 1.95291 1.966683 1.533179 8.053772
Hispanic Other | 1.976147 1.951294 1.545952 1.980148 1.653212 9.106754
Non Hispanic Other | 1.992229 1.975609 1.983534 1.730003 1.695378 9.376751
White non Hispanic | 1.952679 1.589272 1.661214 1.576422 39.80197 46.58156
|
Total | 9.18557 8.102016 9.111909 9.223017 46.52493 82.14744
--------------------------------------------------------------------------------------------
* Tabling the score yields an ID of 82.1, meaning 82% of the cases are misclassified, i.e. would have to move to a different cell for the predicted table to match the actual one.
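* The ID arithmetic is simple enough to check outside Stata. A minimal Python sketch for the constant-only model, using the same (50/N)*abs(actual-predicted) formula as the gen command above (the function name is made up for illustration):

```python
# Observed 5x5 table from the log (rows: husband's group, columns: wife's group).
counts = [
    [42521, 291, 412, 393, 2064],
    [94, 18088, 612, 433, 6067],
    [310, 633, 5901, 258, 4507],
    [101, 317, 214, 3509, 3959],
    [615, 5338, 4403, 5505, 543276],
]
total = sum(sum(row) for row in counts)


def index_of_dissimilarity(observed, predicted):
    """Sum of (50/N)*|observed - predicted| over all cells, in percent."""
    n = sum(sum(row) for row in observed)
    return sum(
        50 / n * abs(o - p)
        for obs_row, pred_row in zip(observed, predicted)
        for o, p in zip(obs_row, pred_row)
    )


# Constant-only predictions: the grand mean in every cell.
constant_only = [[total / 25] * 5 for _ in range(5)]

id_constant = index_of_dissimilarity(counts, constant_only)
print(round(id_constant, 2))
```

* The factor is 50 rather than 100 because every misplaced case is counted twice: once in the cell it is missing from and once in the cell it was wrongly assigned to.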
. drop ID_indep ID_constant ID_dichot_endog ID_quasi_indep
2) The independence model:
. poisson count i.meth_num i.feth_num
Iteration 0: log likelihood = -300152.18
Iteration 1: log likelihood = -228156.87
Iteration 2: log likelihood = -225010.67
Iteration 3: log likelihood = -225002.47
Iteration 4: log likelihood = -225002.47
Poisson regression Number of obs = 25
LR chi2(8) = 2702958.36
Prob > chi2 = 0.0000
Log likelihood = -225002.47 Pseudo R2 = 0.8573
-------------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
meth_num |
Mexican American | -.5911152 .0078375 -75.42 0.000 -.6064764 -.5757541
Hispanic Other | -1.369902 .0103938 -131.80 0.000 -1.390273 -1.34953
Non Hispanic Other | -1.729818 .012056 -143.48 0.000 -1.753448 -1.706189
White non Hispanic | 2.504712 .0048661 514.72 0.000 2.495175 2.51425
|
feth_num |
Mexican American | -.5705308 .0079658 -71.62 0.000 -.5861435 -.554918
Hispanic Other | -1.330005 .0104668 -127.07 0.000 -1.350519 -1.30949
Non Hispanic Other | -1.46366 .0110428 -132.54 0.000 -1.485303 -1.442016
White non Hispanic | 2.551713 .0049699 513.43 0.000 2.541972 2.561454
|
_cons | 8.028738 .0065777 1220.60 0.000 8.015846 8.04163
-------------------------------------------------------------------------------------
. poisgof
Deviance goodness-of-fit = 449774.9
Prob > chi2(16) = 0.0000
Pearson goodness-of-fit = 1176372
Prob > chi2(16) = 0.0000
. predict indep_model_class
(option n assumed; predicted number of events)
. table meth_num feth_num, contents (sum count sum indep_model_class ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 42521 291 412 393 2064 45681
| 3067.867 1734.036 811.3774 709.8674 39357.85 45681
|
Mexican American | 94 18088 612 433 6067 25294
| 1698.707 960.1523 449.2673 393.0603 21792.81 25294
|
Hispanic Other | 310 633 5901 258 4507 11609
| 779.6429 440.674 206.1969 180.4 10002.09 11609
|
Non Hispanic Other | 101 317 214 3509 3959 8100
| 543.9838 307.4734 143.8707 125.8713 6978.801 8100
|
White non Hispanic | 615 5338 4403 5505 543276 559137
| 37550.8 21224.66 9931.288 8688.801 481741.4 559137
|
Total | 43641 24667 11542 10098 559873 649821
| 43641 24667 11542 10098 559873 649821
--------------------------------------------------------------------------------------------
* The independence model fits the marginals, that is, the row and column totals, and therefore also the grand total.
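* Because the independence model reproduces the margins exactly, its predicted counts have a closed form, row total * column total / N, and its coefficients are logs of ratios of marginal totals. A minimal Python sketch of that check (not part of the Stata session; names are illustrative):

```python
import math

# Observed 5x5 table from the log (rows: husband's group, columns: wife's group).
counts = [
    [42521, 291, 412, 393, 2064],
    [94, 18088, 612, 433, 6067],
    [310, 633, 5901, 258, 4507],
    [101, 317, 214, 3509, 3959],
    [615, 5338, 4403, 5505, 543276],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
total = sum(row_totals)

# Predicted count under independence: (row total * column total) / N.
fitted = [[r * c / total for c in col_totals] for r in row_totals]

# The intercept is the log predicted count of the reference cell (Black husband, Black wife).
cons = math.log(fitted[0][0])

# Each main effect is the log ratio of a marginal total to the reference group's total.
husband_mexican_american = math.log(row_totals[1] / row_totals[0])
wife_white = math.log(col_totals[4] / col_totals[0])

print(round(fitted[0][0], 3), round(cons, 6))
print(round(husband_mexican_american, 7), round(wife_white, 6))
```

* These values should agree with the _cons, meth_num and feth_num estimates in the poisson output above.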
. gen ID_indep_class=(50/649821)*(abs(count- indep_model_class ))
. table meth_num feth_num, contents (sum ID_indep_class ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 3.035692 .1110334 .0307298 .0243811 2.869548 6.071385
Mexican American | .123473 1.31789 .0125213 .0030731 1.210011 2.666968
Hispanic Other | .0361363 .0147984 .4381824 .0059709 .4228154 .9179034
Non Hispanic Other | .0340851 .000733 .005396 .2603123 .2323564 .5328828
White non Hispanic | 2.841998 1.222388 .4253701 .2449752 4.734732 9.469463
|
Total | 6.071385 2.666842 .9121997 .5387127 9.469463 19.6586
--------------------------------------------------------------------------------------------
. gen byte endogamy_diagonal=0
. replace endogamy_diagonal=1 if meth_num== feth_num
(5 real changes made)
. table meth_num feth_num, contents (mean endogamy_diagonal ) cellwidth(10)
note: cellwidth too small, variable name truncated;
to increase cellwidth, specify cellwidth(#)
--------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non
--------------------+-----------------------------------------------------------
Black, non Hispanic | 1 0 0 0 0
Mexican American | 0 1 0 0 0
Hispanic Other | 0 0 1 0 0
Non Hispanic Other | 0 0 0 1 0
White non Hispanic | 0 0 0 0 1
--------------------------------------------------------------------------------
3) Independence plus one term for the endogamy diagonal
. poisson count i.meth_num i.feth_num endogamy_diagonal
Poisson regression Number of obs = 25
LR chi2(9) = 3129502.75
Prob > chi2 = 0.0000
Log likelihood = -11730.28 Pseudo R2 = 0.9926
-------------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
meth_num |
Mexican American | -.391273 .011867 -32.97 0.000 -.4145319 -.368014
Hispanic Other | -.8877092 .0141065 -62.93 0.000 -.9153574 -.860061
Non Hispanic Other | -1.321856 .0156512 -84.46 0.000 -1.352532 -1.29118
White non Hispanic | 1.233632 .008495 145.22 0.000 1.216982 1.250282
|
feth_num |
Mexican American | -.2689479 .0120762 -22.27 0.000 -.2926167 -.245279
Hispanic Other | -.6892557 .0142621 -48.33 0.000 -.7172089 -.6613025
Non Hispanic Other | -.5729032 .0146367 -39.14 0.000 -.6015907 -.5442157
White non Hispanic | 1.467609 .0086021 170.61 0.000 1.450749 1.484469
|
endogamy_diagonal | 3.208822 .0058359 549.84 0.000 3.197384 3.22026
_cons | 7.298074 .0068811 1060.60 0.000 7.284587 7.311561
-------------------------------------------------------------------------------------
. display exp(3.2088)
24.749369
* How to interpret the coefficients: for the endogamy diagonal, having the same race/ethnicity as one’s spouse increases the log count by a highly significant 3.2. If we exponentiate, being on the endogamy diagonal multiplies the expected count by a factor of 24.7, or in Stata language an Incidence Rate Ratio of 24.7.
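* The same exponentiation turns the confidence limits of the coefficient into confidence limits for the multiplier. A small Python sketch using the estimates printed in the model 3 output (the helper name is hypothetical):

```python
import math


def incidence_rate_ratio(coef):
    """Convert a Poisson log-count coefficient into a multiplicative factor."""
    return math.exp(coef)


# Rounded coefficient, as in the display command above.
irr_rounded = incidence_rate_ratio(3.2088)

# Full-precision coefficient and its 95% confidence limits from the poisson output.
irr = incidence_rate_ratio(3.208822)
irr_low = incidence_rate_ratio(3.197384)
irr_high = incidence_rate_ratio(3.22026)

print(round(irr_rounded, 6), round(irr_low, 2), round(irr_high, 2))
```

* Exponentiating the endpoints works because exp() is monotonic, so the interval on the log scale maps directly onto an interval for the ratio.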
. poisgof
Deviance goodness-of-fit = 23230.56
Prob > chi2(15) = 0.0000
Pearson goodness-of-fit = 20603.22
Prob > chi2(15) = 0.0000
. gen byte endogamy_diagonal_cat=0
. replace endogamy_diagonal_cat= meth_num if meth_num== feth_num
(5 real changes made)
. predict endogamy_dichotomous
(option n assumed; predicted number of events)
. gen ID_endogamy_dichotomous=(50/649821)*(abs(count- endogamy_dichotomous ))
. table meth_num feth_num, contents (sum ID_endogamy_dichotomous ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | .4581418 .0644826 .0253613 .0338643 .3344336 .9162835
Mexican American | .0696381 .0621266 .0085046 .0100295 .1332898 .2835886
Hispanic Other | .0229383 .0129488 .1272461 .0065332 .1437687 .3134351
Non Hispanic Other | .0225406 .0012274 .001251 .1530409 .1731031 .351163
White non Hispanic | .3430248 .1124331 .1428519 .2034678 .115729 .9175065
|
Total | .9162836 .2532186 .3052149 .4069357 .9003242 2.781977
--------------------------------------------------------------------------------------------
* The fit is better by ID now: only 2.8% of all cases are misclassified, down from 19.7% under independence.
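* Model 3 can also be fit without a regression routine. The sketch below uses iterative proportional fitting (a swapped-in technique, not what Stata's poisson command does) to match the three sets of totals the model reproduces: row totals, column totals, and the diagonal versus off-diagonal totals. It is written in Python under those assumptions and should land on the same maximum likelihood fit.

```python
import math

counts = [
    [42521, 291, 412, 393, 2064],
    [94, 18088, 612, 433, 6067],
    [310, 633, 5901, 258, 4507],
    [101, 317, 214, 3509, 3959],
    [615, 5338, 4403, 5505, 543276],
]
k = len(counts)
total = sum(map(sum, counts))
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
diag_total = sum(counts[i][i] for i in range(k))

fitted = [[1.0] * k for _ in range(k)]
for _ in range(2000):
    # Step 1: rescale each row to match the husband's marginal total.
    for i in range(k):
        scale = row_totals[i] / sum(fitted[i])
        fitted[i] = [v * scale for v in fitted[i]]
    # Step 2: rescale each column to match the wife's marginal total.
    for j in range(k):
        scale = col_totals[j] / sum(fitted[i][j] for i in range(k))
        for i in range(k):
            fitted[i][j] *= scale
    # Step 3: rescale diagonal and off-diagonal cells to match their totals.
    on = sum(fitted[i][i] for i in range(k))
    off = sum(map(sum, fitted)) - on
    for i in range(k):
        for j in range(k):
            fitted[i][j] *= diag_total / on if i == j else (total - diag_total) / off

# The diagonal term appears twice in this log cross-product ratio, hence the 0.5.
endogamy_coef = 0.5 * math.log(
    fitted[0][0] * fitted[1][1] / (fitted[0][1] * fitted[1][0])
)
id_model3 = 50 / total * sum(
    abs(counts[i][j] - fitted[i][j]) for i in range(k) for j in range(k)
)
print(round(endogamy_coef, 4), round(id_model3, 3))
```

* The number of passes (2000) is a deliberately generous assumption; the fit stabilizes long before that on a table this small.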
4) Quasi-Independence, or independence plus a separate term for each cell on the endogamy diagonal.
. poisson count i.meth_num i.feth_num i.endogamy_diagonal_cat
Poisson regression Number of obs = 25
LR chi2(13) = 3151666.22
Prob > chi2 = 0.0000
Log likelihood = -648.54411 Pseudo R2 = 0.9996
---------------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
meth_num |
Mexican American | .899968 .0213908 42.07 0.000 .8580427 .9418932
Hispanic Other | .6519274 .0222121 29.35 0.000 .6083924 .6954623
Non Hispanic Other | .4459202 .0231596 19.25 0.000 .4005282 .4913121
White non Hispanic | 2.971213 .0235674 126.07 0.000 2.925022 3.017404
|
feth_num |
Mexican American | 1.829598 .032363 56.53 0.000 1.766168 1.893028
Hispanic Other | 1.653511 .0327399 50.50 0.000 1.589342 1.71768
Non Hispanic Other | 1.794394 .0323429 55.48 0.000 1.731004 1.857785
White non Hispanic | 3.995454 .0336881 118.60 0.000 3.929427 4.061482
|
endogamy_diagonal_cat |
1 | 6.873631 .0370393 185.58 0.000 6.801036 6.946227
2 | 3.289316 .0229991 143.02 0.000 3.244239 3.334393
3 | 2.593316 .0262895 98.64 0.000 2.54179 2.644843
4 | 2.13865 .0286843 74.56 0.000 2.08243 2.19487
5 | 2.454583 .0192917 127.24 0.000 2.416772 2.492394
|
_cons | 3.784122 .0367204 103.05 0.000 3.712151 3.856093
---------------------------------------------------------------------------------------
* Note how different the endogamy diagonal terms are. Non-Hispanic blacks (category 1) are the most endogamous, i.e. the most likely to be married to someone from the same group. Below we test the difference between two of the endogamy diagonal terms (Mexican American, category 2, vs. non-Hispanic white, category 5), and the difference is highly significant, which it should be. Adding 4 terms to model 3 reduced the deviance by about 22,000 (from 23230.56 to 1067.088) on 4 df (15 - 11).
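* The comparison of models 3 and 4 is a likelihood ratio test for nested models: the drop in deviance is chi-square distributed with df equal to the number of added terms. A Python sketch of the arithmetic, using the deviances and log likelihoods printed in this log (the tail-probability helper is illustrative and only handles even df):

```python
import math

# Deviance and residual df reported by poisgof for each model.
deviance_model3, df_model3 = 23230.56, 15
deviance_model4, df_model4 = 1067.088, 11

lr_chi2 = deviance_model3 - deviance_model4
lr_df = df_model3 - df_model4

# The same statistic from the log likelihoods in the poisson output.
loglik_model3, loglik_model4 = -11730.28, -648.54411
lr_from_loglik = 2 * (loglik_model4 - loglik_model3)


def chi2_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    term, series = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        series += term
    return math.exp(-half) * series


p_value = chi2_tail_even_df(lr_chi2, lr_df)
print(round(lr_chi2, 3), lr_df, p_value)
```

* With a statistic this large the p-value underflows to zero in floating point, which is the numerical version of "highly significant".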
. poisgof
Deviance goodness-of-fit = 1067.088
Prob > chi2(11) = 0.0000
Pearson goodness-of-fit = 1294.682
Prob > chi2(11) = 0.0000
. table meth_num feth_num, contents (mean endogamy_diagonal_cat ) cellwidth(10)
note: cellwidth too small, variable name truncated;
to increase cellwidth, specify cellwidth(#)
--------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non
--------------------+-----------------------------------------------------------
Black, non Hispanic | 1 0 0 0 0
Mexican American | 0 2 0 0 0
Hispanic Other | 0 0 3 0 0
Non Hispanic Other | 0 0 0 4 0
White non Hispanic | 0 0 0 0 5
--------------------------------------------------------------------------------
. codebook meth_num
-----------------------------------------------------------------------------------------------------
meth_num husband's race/ethnicity
-----------------------------------------------------------------------------------------------------
type: numeric (byte)
label: ethnicity
range: [1,5] units: 1
unique values: 5 missing .: 0/25
tabulation: Freq. Numeric Label
5 1 Black, non Hispanic
5 2 Mexican American
5 3 Hispanic Other
5 4 Non Hispanic Other
5 5 White non Hispanic
. test 2.endogamy_diagonal_cat-5.endogamy_diagonal_cat=0
( 1) [count]2.endogamy_diagonal_cat - [count]5.endogamy_diagonal_cat = 0
chi2( 1) = 492.45
Prob > chi2 = 0.0000
. lincom 2.endogamy_diagonal_cat-5.endogamy_diagonal_cat
( 1) [count]2.endogamy_diagonal_cat - [count]5.endogamy_diagonal_cat = 0
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | .8347328 .0376155 22.19 0.000 .7610076 .9084579
------------------------------------------------------------------------------
. predict quasi_indep_model
(option n assumed; predicted number of events)
. gen ID_quasi_indep=(50/649821)*(abs(count- quasi_indep_model ))
. table meth_num feth_num, contents (sum ID_quasi_indep ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 0 .0012956 .0140117 .0098736 .0251809 .0503618
Mexican American | .0010935 0 .0035827 .0167726 .0142835 .0357322
Hispanic Other | .0173555 .008219 0 .0192346 .0063399 .051149
Non Hispanic Other | .0024838 .0085578 .0111633 0 .0172374 .0394423
White non Hispanic | .0187457 .0009568 .0064311 .0261336 0 .0522672
|
Total | .0396785 .0190292 .0351888 .0720144 .0630417 .2289526
--------------------------------------------------------------------------------------------
* Now we are down to an ID of 0.2%. This model fits very well, but because the sample size is so large, the deviance goodness-of-fit test still rejects it (1067 on 11 df). We need more terms to fit the off-diagonal cells.
. table meth_num feth_num, contents (sum count sum quasi_indep_model ) row col cellwidth(10)
--------------------------------------------------------------------------------------------
husband's | wife's race/ethnicity
race/ethnicity | Black, non Mexican Am Hispanic O Non Hispan White non Total
--------------------+-----------------------------------------------------------------------
Black, non Hispanic | 42521 291 412 393 2064 45681
| 42521 274.1622 229.8974 264.6786 2391.262 45681
|
Mexican American | 94 18088 612 433 6067 25294
| 108.2118 18088 565.4383 650.9836 5881.366 25294
|
Hispanic Other | 310 633 5901 258 4507 11609
| 84.44069 526.1821 5901 507.9809 4589.396 11609
|
Non Hispanic Other | 101 317 214 3509 3959 8100
| 68.72013 428.2213 359.0829 3509 3734.976 8100
|
White non Hispanic | 615 5338 4403 5505 543276 559137
| 858.6274 5350.435 4486.582 5165.357 543276 559137
|
Total | 43641 24667 11542 10098 559873 649821
| 43641 24667 11542 10098 559873 649821
--------------------------------------------------------------------------------------------
. save "C:\Users\mexmi\Downloads\five cat intermar data US 3 decades.dta", replace
file C:\Users\mexmi\Downloads\five cat intermar data US 3 decades.dta saved
. log close
name: <unnamed>
log: C:\Users\mexmi\Documents\newer web pages\soc_382_stuff\logs\loglin w 5by5 intermar.log
log type: text
closed on: 29 Jan 2019, 16:40:04
-----------------------------------------------------------------------------------------------------